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Abstract 

It is common and convenient to treat distributed physical parameters as Gaussian random fields 
and model them in an "inverse procedure" using measurements of various properties of the fields. This 
article presents a general method for this problem based on a flexible parameterization device called 
"anchors" , which captures local or global features of the fields. A classification of all relevant data 
into two categories closely cooperates with the anchor concept to enable systematic use of datasets 
of different sources and disciplinary natures. In particular, nonlinearity in the "forward models" is 
handled automatically. Treatment of measurement and model errors is systematic and integral in the 
method; however the method is also suitable in the usual setting where one does not have reliable 
information about these errors. Compared to a state-space approach, the anchor parameterization renders 
the task in a parameter space of radically reduced dimension; consequently, easier and more rigorous 
statistical inference, interpretation, and sampling are possible. A procedure for deriving the posterior 
distribution of model parameters is presented. Based on Monte Carlo sampling and normal mixture 
approximation to high-dimensional densities, the procedure has generality and efficiency features that 
provide a basis for practical implementations of this computationally demanding inverse procedure. We 
emphasize distinguishing features of the method compared to state-space approaches and optimization- 
based ideas. Connections with existing methods in stochastic hydrogeology are discussed. The work is 
illustrated by a one-dimensional synthetic problem. 

Key words: anchored inversion, Gaussian process, ill-posedness, model error, state space, pilot point 
method, stochastic hydrogeology. 

1 Introduction 

The term "inverse problems" is extremely broad, and the Uterature is vast. Considered in this paper is one 
type of inverse problems, which concern inferring a spatially distributed, highly heterogeneous attribute from 
a finite number of relevant data. "Spatially distributed" , or "point referenced" , means that in principle the 
attribute is a function of the spatial location (geometric point). Relevant data include measurements of the 
attribute in question or covariates, and measurements of responses of the attribute field to certain forcings. 
Such problems arise in many situations, as shown in the following examples. 

Example 1. Yeh [1986] reviews techniques for the groundwater inverse problem. Groundwater flow and 
transport are controlled by hydraulic conductivity or transmissivity of the medium. These spatially dis- 
tributed attributes can be measured, with high uncertainty, at a small number of locations but can not be 
resolved throughout the spatial domain of interest. However, one can make observations of natural or experi- 
mental flow and transport processes to get head (water level), flux (water flow), and tracer concentrations at 
selected locations and times. The flow and transport processes are governed by partial differential equations. 
One's task is to infer the conductivity or transmissivity field given the above observations. 

Example 2. Bube and Burridge [1983] study a 1-D problem for exploration seismology. In this setting, 
an impulsive or vibrating load applied at the ground surface launches elastic waves into the earth's interior. 
Part of the wave energy is reflected and reaches back the ground surface, where it is monitored at many 
times. Mechanic attributes of the elastic medium that control wave propagation include density and Lame 
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constants; "effects" of the wave propagation include pressure and particle velocity, which are measured at 
the ground surface. The wave propagation is described by a hyperbolic equation system. The task is to 
recover the mechanic attributes, or transformations thereof, using the ground surface measurements. 

Example 3. Newsam and Enting [1988] investigate the problem of deducing the spatial distribution of 
sources (including sinks) of CO2 on the surface of the globe, given observations of surface concentrations of 
this non-reactive gas. The source distribution is linked to concentration by a transport model, governed by 
a diffusion equation. Although the transport model describes CO2 concentration in a spherical shell 12 km 
thick around the earth, all available concentration measurements are at the earth's surface, with time series 
records at some 20 locations around the globe. 

In these examples, the attribute of interest (hydraulic properties of flow media, mechanic properties of 
elastic media, and source/sink of gas) is connected with the measurements (head and flux, tracer concen- 
tration; pressure, particle velocity; and CO2 concentration) by a "forward process" (subsurface flow and 
transport, wave propagation, and atmospheric transport), which is usually embodied in a numerical model 
due to the spatial heterogeneity of the input attribute field. The attribute of interest is a function (or func- 
tional) of the spatial location, hence is of infinite dimension, therefore the inverse problem is ill-posed, or 
under-determined. For most practical purposes, we assume the field is discretized into a finite vector corre- 
sponding to a numerical grid. As computational power advances, people try to use finer and finer numerical 
grids in order to increase the reliability of the forward model, as an approximation to the actual process; 
as a result, the size of the model grid still far exceeds the number of measurements, hence the problem of 
inferring the attribute field (vector) remains seriously ill-posed. 

Because of the ill-posedness, infinitely many configurations of the attribute field, when fed into the 
forward model, may produce equally good match to the observations, yet some of these configurations 
may well contain features that are physically unrealistic. Perhaps the most commonly used solution to 
this problem is "regularization" [Tikhonov and Arsenin, 1977; Tenorio, 2001], which imposes subjective 
requirements on the structure (such as smoothness) of the attribute field. Usually, a "model performance" 
term, which indicates how closely a particular solution of the attribute field reproduces the measurements, 
is optimized in search for attribute fields that achieve a required level of performance, under the constraint 
of a regularization term. This approach is exemplified for groundwater inverse problems by Doherty [2003] ; 
Gomez-Herndnez et al. [1997]; Kowalsky et al. [2004]. 

In this class of inverse problems, regularization is essentially always used, explicitly or not, in the form of 
certain assumptions on the spatial structure, because brute force search for a "good" attribute field is both 
hopeless and meaningless. 

Another response to the ill-posedness of the problem is that it has become more or less a consensus 
to take a nondeterministic perspective, treating the attribute field as a stochastic process [Kitanidis, 1986; 
Rubin, 2003]. From this angle, a Bayesian interpretation is convenient and Monte Carlo sampling methods 
are indispensable [Sambridge and Mosegaard, 2002; Robert and Casella, 2005]. 

Besides ill-posedness, another main difficulty in this problem is nonlinearity in the forward model with 
respect to the attribute in question. The case with a linear forward model is largely solved by the Kalman 
filter [see an introduction by Welch and Bishop, 1995]. However, the vast majority of the forward models 
are nonlinear. The Kalman filter has been extended to cope with nonlinearity by "linearization" [Welch 
and Bishop, 1995]. This strategy in hydrogeology is represented by the "quasi-linear" method of Kitanidis 
[1995]. A direct attack at nonlinearity is the ensemble Kalman filter [Evensen, 2003] and variants. 

Most of the works mentioned above take a state-space approach, which treats the attribute vector on the 
numerical model grid as the model parameter vector, and outputs values of this vector (i.e. realizations of the 
attribute field) as the product of the inversion. It is generally desired to provide an ensemble of realizations, 
the spread of which present a measure of the uncertainty in the solution. Often, the algorithm for generating 
each ensemble member is identical, hence the computational cost is proportional to the size of the ensemble. 

This paper proposes a new statistical approach to this inverse problem. The idea treats the unknown 
attribute field as a Gaussian process and uses a basic property of the Gaussian process to achieve parsimonious 
yet fiexible parameterization of the field. In no attempt to make a comprehensive assessment at this point, we 
mention several distinguishing features of the proposed method, which shall be called "anchored inversion" 
in this paper. 
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The method is not a state-space approach. It describes the field with a relatively small number of param- 
eters, say several tens; this parameterization is very flexible, and in principle is separate from the resolution 
of the numerical model grid. This dramatic reduction in the parameter dimensionality has advantages in 
parameter inference, sampling, as well as statistical interpretation of the distributions of the model param- 
eters and field realizations. The ill-posedness issue of this relatively short parameter vector is much milder 
or even eliminated. 

Result of anchored inversion is some representation of the posterior distribution of the model parameters. 
Generation of field realizations is a subsequent, optional step based on the parameter distribution. Compu- 
tational cost of this step is essentially negligible. In other words, computational cost of the inverse procedure 
is not tied to the number of field realizations that are ultimately needed for prediction tasks. 

Anchored inversion is not centered at optimization. Optimization-based approaches typically places much 
emphasis on achieving certain level of model reproduction of the observed data; this is not the case with 
the proposed method. This distinction has important conceptual implications. It avoids the difficult tasks 
of determining relative weights for the performance and regularization terms, as well as relative weights for 
data components in the performance term. It also avoids the difficult task of determining a stopping rule for 
the optimization algorithm, and eliminates ad hoc statistical interpretations for realizations of the attribute 
field that terminate with difi^erent (or identical) values of an objective function. This, naturally, also saves 
much effort in developing an optimization algorithm. 

The proposed method is general with regard to nonlinearity of the forward model. Indeed, it views 
the linear case as a solved problem, and is mainly targeted at incorporating data from nonlinear forward 
processes. The method (as is presented here) assumes the forward model is deterministic, but otherwise 
arbitrary. 

Following this introduction, in Section 2 we present the formulation of the method. This section introduces 
the concept of "anchor parameters" or "anchors", which provides flexible parameterization of the fleld, 
especially for local features, in addition to conventional "structural parameters" . This section also describes 
a data classification scheme. Based on this scheme, all data are incorporated systematically, as is described 
in Section 3. The model inference presented in Section 3 treats model and data errors as an integral part. In 
Section 4 we present a sampling-based method for deriving the distribution of the model parameter vector. 
The method has promising efficiency features, and is a general idea in its own right. Section 5 presents a 
synthetic study of a one-dimensional problem, demonstrating model formulation and parameter inference. 

In Section 6 we discuss connections of anchored inversion to existing methods in stochastic hydrogeology. 
One emphasis is the popular "pilot point" method, which has similarities to anchored inversion in terms of 
parameterization. Another emphasis is the decomposition of error into measurement error and model error. 
This decomposition is conceptually and practically important, but appears to be lacking in the existing 
methods discussed. 

The paper concludes in Section 7 with a summary of contribution and future work. 

As for notation, bold symbols are used for vectors, and nonbold symbols for scalars. Matrices are 
upper-case bold letters. The scalar symbol x is used for a single spatial location, even if the space is 
multi-dimensional; the bold symbol x stands for a vector of locations. All vectors are column vectors. The 
concatenation of two vectors a and b is (a^ ,6^)"^ , where the superscript t denotes matrix transpose, but 
for simplicity we write (a, 6). We assume the field is discretized for modeling purposes; as a result we will 
be able to use matrix notations instead of integrals. Where the context permits, we use a slight notational 
abuse to distinguish functions by the independent variable. For example, p(0) and p(V') are not the same 
function evaluated at two values but are two different functions, of (j) and ^ respectively. 

2 Model formulation and data classification 

Denote the spatial random process by Y{x), where F e is a variable of interest indexed by the spatial 
coordinate x in 1-, 2- or 3-D space. We model the spatial variable y as a Gaussian process. In conventional 
geostatistical formulation, this process may be described by a "structural parameter" vector, 0, comprising 
two groups of parameters: parameters that define the expected value (mean function) of Y at any location, 
and parameters that describe the association (covariance) of Y's at different locations. In essence, this 



4 



formulation assumes 

Y{x)\e^N{^,{x),QY.Y), (1) 

where ix is the mean function and Q is the covariance matrix, both being functions of the location vector x 
and the structural parameters 6. The inverse methodology being proposed makes no assumption about the 
structural parameter 0] the specific form that is used to create the illustrations in this study is presented in 
Appendix B. We use x to denote the location vector of the model grid of the whole field, then Y = Y[x) is 
the field vector of Y values. 

If we know the value of a linear function of the field, say, 

^ = Hy, 

where ff is a known matrix and y is the vector of the Y values in the whole field, then a basic property of 
the normal distribution states (see Appendix A) that, conditional on this information, the process Y still 

has a normal distribution: 

Y{x) I {d, -d) ~ N{tJi{x) + Qy_^^ Qhy,hy - ^^(*)) ' ^^^^ - Qy,hyQhy,hy^hy,y) ' (2) 

where 

Qy.HY Q-y ^y ' ^ HY .HY HQy- Y''^ ' ^ HY .Y HQ-^ y ' ('^) 

The superscript t denotes matrix transpose, and Qy- y (or Q-y- y) denotes the covariance matrix between 

1^(0;) and Y (or between Y and Y{x)). 

If wc do not have such information as a linear function of the field, however, the handy property above 
prompts us to designate a (wisely chosen) linear function of the field as something special: 

and treat t9 as unknown parameters. (The matrix H above has no relation to the previous H.) We 
call these parameters "anchors". Hence our model for the spatial process Y consists of the parameter 
vector = {6,'d) and the user-specified matrix H. The anchor parameters introduce variations in the 
mean function and spatial correlation structure, as in (2), that are beyond the expressing capability of the 
structural parameters. One may say that the structural parameters capture global features, whereas the 
anchor parameters capture "local" (or additional global) features. One may also say that the structural 
parameters describe the field prior to knowledge of the anchor parameters. 

Given data z that relate to Y in some way, our goal in this study is to derive the conditional distribution 
of the model parameter vector, that is, p{& \ z). Once this distribution has been obtained, the distribution 
of the field in this model formulation is 

p{y I p{@ \z))= j p{y I 0) p{& \z)d@, (4) 

where p{y \ 0) is given in (2) (where x is replaced by x and Y replaced by Y). Note that the distribution (4) 
is not p(y \ z) , which would be / p{y \S,z) p{& \ z) d0. However, the main point of the proposed method is 
to make p{y \ p{& \ z)) in effect approximate p{y \ z) for the purpose of predicting a particular field function, 
say /(y), in the sense that / p{f \ 0)p(0 | z) d0 « p{f \ z). This approximation is achieved by the method 
as a whole and in particular by effective choice of the anchor parameters, rather than by any single step 
of mathematical manipidation. The field y \ p(0 | z) is conditioned on the data z indirectly via the model 
parameters 0. By giving up direct control over this data conditioning (one example of direct control is 
insisting on a specified level of data reproduction by simulated fields), the proposed method gains in rigorous 
parameter inference and tractable statistical analysis of the parameters as well as of the generated field 
realizations. These benefits will become clear in subsequent sections. We mention one benefit here: the 
distribution p{y \ 0) is the normal distribution (2) and is easy to sample from. In contrast, the distribution 
p{y I 0, z) is unknown and there is no easy way to sample from it. 

Now a natural question arises: where does H come from? Or, how do we define or choose anchor 
parameters? To answer this question (only partially in this paper) , we examine the various situations in the 
data-anchor relation and classify all on-site data into two categories based on this relation, as follows. 
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Type A data: data that evaluate a linear function of the field Y: 

Za. = H^y + ea, (5) 

where are errors with distribution p£^(ea). This data type includes two situations: data of point values of 
Y at certain locations, and data of linear functions of Y at multiple locations or the entire field. The values 

may be measured directly, or may be obtained through intermediate models of arbitrary complexity. 
The errors combine those from measurements and from any intermediate models, such as regressions and 
empirical relations. In the case where empirical relations give a nonlinear function of y at a single location, 
we transform the information to a linear function of Y , with the error distribution changed accordingly. 

Type B data: data that evaluate a nonlinear function of Y at multiple locations (e.g., in a spatial 
domain) : 

2b=X(y) + eb, (6) 

where is a known nonlinear function embodied in either analytical forms or numerical models. In the con- 
text of "inversion", Ai is called the "forward function" or "forward model". The errors Cb, with distribution 
P£j_,(eb), combine measurement errors and errors in the implementation of the function Ai (see Section 3.4). 
With type-A data, we always define corresponding anchors 

The values of these anchor parameters are directly provided by the data Zg_, possibly with uncertainty. In 
order to capture the information in type-B data, we define additional anchors 

where the linear function (i.e. matrix) Jfb is determined with or without reference to the forward function 
M.. We call i?a and i9b anchors of type A and B, respectively; or, more descriptively, the former "measured 
anchors" and the latter, "inverted anchors" . If an anchor is defined as the value of F at a specific location, 
we may call it an "anchor point"; otherwise an "anchor function". Collectively, we write i9 = (i9a,i9b) and 

The simplest way of defining type-B anchors is to choose unsampled locations {xj} and designate the 
variables {^(2;^)} as anchors. Generally speaking, we should choose locations {xi] that significantly increase 
the resolving power of the formulation (2) for the field. An interesting situation happens when the forward 
function Ai is close to a linear function of the field, say h{y). In that situation, we designate h{Y) (with 
a known form) as anchor parameters. By this designation, the data naturally are very informative about 
these anchor parameters, although they do not provide the parameter values directly. 

Further detail regarding the strategy of defining type-B anchors is beyond the scope of this article. It is 
worth mentioning that inverted anchor points are analogous to the "pilot points" in de Marsily et al. [1984] 
(see Section 6.1 for more discussion). In the context of the pilot point method, RamaRao and LaVenue 
[1995] and Gomez-Herndnez et al. [1997] discuss strategies for choosing locations as pilot points. 

In the groundwater example in Section 1, direct measurements of local-scale hydraulic conductivity 
or transmissivity (always with much uncertainty) are type-A data. Type-A data also include covariates 
such as grain-size distribution and core-support geophysical properties. Examples of type-B data are head 
measurements in well pumping tests and concentration measurements in tracer transport experiments. 

In the geophysical example in Section 1, type-A data are mechanic properties of the elastic medium, 
usually only available at the ground surface. Type-B data are recorded pressure and particle velocity at the 
ground surface. 

In the atmospheric example in Section 1, type-A data are direct monitoring of CO2 sources on the ground 
surface or covariates that provide estimates of CO2 source. Type-B data are CO2 concentration measured 
mainly on the ground surface. 

An important issue in type-A data is "change of support (or scale, or resolution)". For example, Merlin 
et al. [2005] develop a disaggregation method to use 40 km resolution satellite data, along with 1 km auxiliary 
data, in fine scale modeling of surface soil moisture. Bindlish and Barros [2002] describe a method to combine 
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two remote sensing datasets of 30 m and 200 m resolutions, respectively, in modeling surface soil moisture. 
Fuentes and Raftery [2005] develop methods to combine station monitoring and regional-scale model output 
data for air pollution; beneath the point-scale measurements and regional-scale predictions, both subject to 
error, is a random field defined in continuous space. 

3 Inference of the model parameters 

In this section we derive the parameter distribution conditional on all data, that is, p{@ \ z). The derivation 
contains the following three steps. 

First, we assign a prior for the parameter vector in the form 

p{@)^p{e)p{i}\e). (7) 

Because of the Gaussian assumption (1), the conditional specification p(')? | 0) is a normal distribution (given 
in Appendix A). Only p{6) requires to be specified by the user. The specification of this term is transparent 
to the anchored inversion methodology; the particular form we have used is given in Appendix B. 

Second, we derive the conditional distribution of the parameter vector given type-A data, that is, 
p{&\z^,p^J. 

Third, we update the preceding conditional distribution to be conditioned on type-B data, that is, derive 

P{®\Za,Pe^,Zb,PeJ. 

While the three steps are a coherent sequence, the last step is the main contribution of this study (besides 
the anchor formulation). 

3.1 Using type-A data 

Noticing that the type-A data z^, and Pe^ simply stipulate the likelihood of the type-A anchors i^a indepen- 
dently of other parameters, we have 

p{&\z,,p,J (xp{&)p,Sz,-i9,). (8) 

In the special case where the data z^, are free of error, that is, p^^ = S{0), the conditional distribution (8) 
is equivalent to i^a = and 

p(e,1?b|'»?a) CXp(0)p(l?a|e)p('»?b|e,1?a), (9) 

in which i9a — ^a- Both p(i?a | d) and p(i?b | d, '^n) are normal distributions (see Appendix A). 

Assume we have a way to sample the parameter distribution (9) (see Appendix B), then sampling the 
distribution (8) is simple: first sample a random error e* from p^^, then set i?a = -Za — Ca and sample {9, i?b) 
from (9). 

3.2 Using type-B data in addition to type-A data 

Via the Bayes theorem we have 

p{&\z^,p^^,Zi„PeJ (Xp{&\z^,p^J Xp{Zh,Pe-, \ @,Z^,p^J 

«P(0)Pea(2:a-l5a) Xp{zi,,p^^ \&) ^^Q^ 

= p{&)p^^{Zs^ - l^a) X J PM{zh - Cb I 0) P£i,(eb)deb. 

(The equality p{zh,pei, \ @,Zi^,pe.J — p[z\,,p^^ \ 0) holds because contains a specific value of i?a, which 
overrides Zg^ and Pe^.) The likelihood term PM{zh — ^h\®) connects the type-B data Z\^ to the model 
parameter 0. This likelihood is the probability density function of the random variable A^(y) | 0, evaluated 
at 2;b — Cb. The distribution of M.{y) |0 is determined by and AA. In general we do not have an 
analytical form for this distribution; neither is there a basis for assuming any convenient form for it. At this 
point, we make no assumption about the nature of the forward function M other than that it is known and 
deterministic. It can be, for example, a simple combination of multiple field functions that generate multiple 



7 



datasets of incomparable nature (such as travel time and chemical concentration) . To keep the methodology 
general, we do not expect to be able to calculate the likelihood pm (^b — Cb | ©) analytically. 

In principle, this likelihood can be estimated numerically and nonparametrically. The reason is that we 
can simulate the random variable A4{y) \ & by sampling y* from p{y \ 0) and then evaluating A4{y*). This 
simulation provides a random sample of M{y) \ 0, therefore the density function of this variable can be 
estimated nonparametrically. In particular, the density at — eb is the likelihood PM{^h — Cb | ®) in (10). 

Nonparametric density estimation is a large topic in statistical literature [Scott, 1992; Wand and Jones, 
1995]. The density in the targeted applications of this study is usually high-dimensional, because the 
dimensionality is the length of the data vector z\j. Estimation of high-dimensional densities is a notoriously 
difficult task, due to the so-called "curse of dimensionality" [see Scott, 1992, sec. 1.5 and chap. 7]. In 
Section 4, we propose a strategy that avoids evaluating the likehhood p{z\^,Pf:^, \ 0) altogether. 

If the data z-^ are free of error, then the integral in (10) reduces to pjvxi^h \ ©)■ 

3.3 Using type-B data without type-A data 

If all data are of type B, then all anchors are inverted ones. In this case, the posterior parameter distribution 
is 

p{e,-db\z-i,,Pe^) CXp(0,1?b) ■X.p{z^o,Pe^ | 0, l^b) 

(xp{e)p{'d\,\e) X y"p^(2;b-eb|0,i?b)Peb(eb)deb. 

This proceeds as explained in Section 3.2. A notable difficulty in this situation is the specification of a 
prior for the structural parameters 9. In the case where type-A data are available, the likelihood term 
p(i9a = Za^\6) provides much-needed regulation on the structural parameters such that one can afford to use 
a relatively naive prior for 0. One does not have this luxury in the absence of type-A data. To appreciate 
the complication in specifying a prior, see Kass and Wasserman [1996], and Scales and Tenorio [2001]. 

3.4 Measurement error and model error: why doing it this way 

In this subsection we answer two questions: (1) Why do we use a reduced parameter vector (0), instead of 
the state space vector (y)? (2) Why do not we know the likelihood function in (10) or assume one? 

Let y be a realization of the spatial field Y. In the modeling study, it is necessarily assumed that this 
vector is indeed a good representation of what is needed for predicting the forward process, A^Oj which is 
modeled (or approximated) by an analytical function or numerical code A4. The true outcome in the field 
that is faithfully described by the model input y is Mo{y), while our model predicts the outcome to be 
M{y). The difference, 

Cbi '^^^ M{y) ~ Mo{y), 

arising from the difference between A4 and A^o, is the error in the forward model. 

In real-world applications model error always exists. There are numerous sources for this error, including 
spatial and temporal discretization, inaccurate boundary and initial conditions, relevant physical factors 
that are omitted from consideration, deficiencies in the mathematical description of the physical process, 
and so on. Taken as a random variable, the model error vector ebi usually has inter-dependent components. 
Furthermore, the distribution of this error vector may be dependent on the input field y; imagine a realistic 
scenario where, as the input y deviates more severely from the actual field being studied, the model Ai 
becomes a less satisfactory representation of the actual process AJqj hence the model error tends to be 
larger. Considering these factors, we conclude that model errors are elusive, yet may not be "small" . 

A conceptually different issue is the discrepancy between the measurement of a quantity and its true 
value, written as 

£h2 -Mo{y), 

where ^b is the measurement. This measurement error arises from intrinsic inaccuracies in the measurement 
instrument and technique, uncertainties due to human operations, and other random factors. It is almost 
always acceptable to assume independence between components of a measurement error vector. It is also 
reasonable to assume that the measurement error is independent of model error and model parameters 
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(including model input y). It is often the case that one has a better idea about measurement error than 
about model error. 

Somewhat between model and measurement errors is error due to the so-called incommensurability, i.e. 
the fact that what is computed by the model and what is measured in the field arc not exactly the same 
quantity. A typical example is the case where a property is measured and modeled with different support (or 
resolution, or scale). In this discussion we include this error in the measurement error (eb2) for convenience. 

It is now clear that the discrepancy between data and model output is the combined contribution of 
model and measurement errors: 

eh = Zh- M{y) = (zb - Mo{y)) - {M{y) - Mo{y)) = €b2 - €bi- 

This is the error term in (6). In derivation (10) we need Petj the distribution of eb. Unfortunately, this 
distribution is unknown in most applications, thanks in no small part to the model error Cbi, which eludes 
reliable quantification and yet may be as large as or larger thanj;he measurement error €b2- 

If we took a state-space approach, treating the whole filed Y as the model parameter vector, then the 
likelihood of this parameter vector with respect to the data Zb and Pe^, would be 

Pei,{zh-M{y)\y). 

As is explained above, this likelihood function is unknown, even if we assume it docs not vary with the 
parameter vector y. The main complicating factor is the part of the difference Zb — ■M.{y) that stems from 
model error. 

The proposed method avoids this problem by taking a reduced parameter vector, 0. The likelihood is 
now 

j PM {Zh - Cb I ®) Pet (cb) deb 

as in (10). This can be re- written as 

PziZb\@), (11) 

where Pz{-) is the density function of the random variable M{y) + e^. Note the data Zb are the observation 
of this random variable, according to (6). A crucial point is that M.{y) is random given a specific 0, because 
the (discrctized) field y is random conditional on 0. While Ai{y) + Cb is the sum of two random variables, 
its distribution is approximately that of 7W(y) if we formulate the system such that variability of M{y)\@ 
dominates that of eb- In that case the likelihood (11) is well defined by the behavior of M.{y), permitting 
us to ignore the elusive but smaller eb- 

This formulation, by reducing the model parameter dimensionality so as to create more variability, places 
statistical inference of the model parameter(s) on solid ground. It eliminates ad hoc assessment of the model- 
data "mismatch" {A4{y) — Zb); meanwhile it is straightforward to accommodate model and measurement 
errors if their distributions are both known (which is unlikely). The statistical relation between model 
parameters and observations does not rely heavily on accurate quantification of model and measurement 
errors. The method works just fine, as it should, in the limiting case where both model and measurement 
errors vanish, which is the case in many synthetic studies. This issue is further discussed in Section 6.3. 

4 Sampling the posterior distribution 

It is generally impossible to get a closed form for the parameter distribution p{& \ z). Inference on this 
posterior involves two high-dimensional problems. The first is the dimensionality of the parameter vector 0. 
Usually there are below or close to 10 structural parameters including trend coefficients, variance, scale and 
smoothness of the correlation function, nugget, and anisotropy parameters. There is no hard limit on the 
number of anchor parameters; they can be several tens or more, depending on the amount of data and other 
considerations. (Here we only count the uncertain anchor parameters. Type-A anchors that are known 
exactly do not add to the complexity of the problem.) The second problem is the dimensionality of the 
likelihood (or density) function p(zb,Peb I ®)- This dimensionality is the length of the type-B data vector 
Zb, which we expect to number in the tens. We have mentioned that conceptually this likelihood can be 
estimated after generating a large sample of the random variable M.{y) | 0; however, density estimation in 
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such high dimensions is only next to impossible [Scott and Wand, 1991], noticing the high cost in running 
the forward model A4. 

The high dimensionality of the model parameter vector calls for a Monte Carlo approach, whereby the 
posterior distribution is represented (i.e. approximated) by a discrete sample from it. A popular method for 
this task is Markov chain Monte Carlo (MCMC). In this study we do not use MCMC, mainly because the 
likelihood function is unknown. Nevertheless, a sampling-based method is the only option. 

Since density estimation in more than, say, five dimensions is practically hopeless, a possible route is to 
use rough density estimation as a guide for the sampling, which we have explored with mixed performance. A 
related idea is a variant of MCMC called "approximate Bayesian computation (ABC)" [see Marjoram et ai, 
2003]. We mention two challenges to the ABC method. First, one has to deal with all the nontrivial technical 
issues in a MCMC procedure, including burn-in, convergence, and so on. Second, ABC uses a measure of the 
simulation-observation mismatch in lieu of likelihood. For multivariate data, there are significant difficulties 
in how to define this measure. 

The procedure we propose here is based on the following (simple in hindsight) observations. (1) From 
p{@ I Zg) andp(A^(y) | 0) we have a joint distribution of (0, A^(y)) | z^; the target distribution p(0 | Za, ^b) 
is a conditional of this joint distribution at A4{y) — Zy^. (2) If p(0, A^(y)) is normal, then the conditional is 
another normal distribution, known analytically. (3) A normal mixture is very flexible and could approximate 
very complicated distributions, such as p(^&, Ai{y)) [Marron and Wand, 1992; West, 1993; Givens and 
Raftery, 1996; McLachlan, 2000]. 

Let {{&i,Wi)}f^i be a sample of ] with weights Wi. With the field parameterization and prior 
specification presented in Appendix B, it is straightforward to get a random sample of ] ^ai hence the 
weights are all l/n. In general, we allow the sample to be obtained by any sophisticated algorithm with 
proper weights; this is practical because it does not require evaluating the expensive forward function M. 
Subsequently, randomly draw iji from p(y \ &i) and evaluate M{yi), for i = 1,. . . ,n. We thus obtain a 
sample of the random vector (0, A^(y)) j^a, denoted by ^ below, with weights {wi}. The distribution of ^ 
can be approximated by a normal mixture in the usual setting of kernel density estimation [sec introduction 
by Sheather, 2004, and references therein]: 

n 

p(ll^a) = ^^«.<^U;l.,/i.S,), (12) 

where (/){■; fj,, S) is the normal density function with mean fi and covariance matrix S, and hi is a scaling 
factor ( "bandwidth" , taken to be 1 here) . This approximation places a Gaussian kernel at every sample 
point ^i, with a covariance matrix taken to be the sample covariance of, say, the 500 nearest neighbors 
of in the sample, in the Mahalanobis sense. 

Given the normal mixture distribution (12) for ^, and as the observed value of A4{y), the conditional 



distribution of is another normal mixture. Take the partition — vi^^ ' ' then (see West [1993] 

[•^21,4 ^22, ij 

and Appendix A) 

n 

p(01za,2b) =^«4 0(0;m„V,), (13) 

i=l 

where Vi cx Wi (f>(^Zb; A4{yi), S22.i), normalized to sum to unit, and 

m, = 0, + ^l2.^^22A^h -M{y.,)), V, = Ell,, - Si2,iS-^S21,i. 

Uncertainty in the data Zg_ may be accommodated while sampling 0, as described in Section 3.1. Un- 
certainty in ^b may be accommodated by perturbing the sampled Miy) according to a specification of the 
error distribution, p^j^ . 

Because the posterior distribution of the model parameters is a normal mixture, various statistics of the 
posterior can be calculated analytically, without resorting to sampling the posterior. 

To make the normal approximation defensible, all components of the variable ^ should be defined on 
(—00,00). In other words, the procedure above does not derive the posterior distribution of the parameter 
vector 0, but of a transform of it. Among the most useful transformations are the log transformation for 
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a positive variable and the logit transformation for a doubly bounded variable, f{x) = log^^^^^- For 
convenience of discussion, transformations to the forward process outcome can be made part of the forward 
function M. The anchor parameters do not need a transformation since they are normal variables by 
assumption or by definition. 

Usually, evaluating the forward function M. (often requiring running a numerical model) is by far the 
most expensive operation in the inversion. In the procedure above, the forward function is evaluated exactly 
once for each value ever examined for the model parameter vector; this happens when we evaluate M.{yi) 
for the field y, drawn from p{y\ The conditioning operation in (13) forcefully brings information in 

the observation into the statistical structure of the system that has been revealed through sampling and 
simulation. These features contribute to the efficiency of the procedure. 

A challenge in this procedure is that the weights {vi} tend to be very skewed, that is, in a mixture of 
thousands of components, possibly only a few have significant weights. This is a consequence of the high 
dimensionality of ^, or 0. 

5 Illustrations 

We use a one-dimensional synthetic problem to illustrate the method. The true 1-D field is shown in Figure 1. 
This is actually the elevation data along some transect in the mountains to the east of the city of Berkeley, 
California, USA. The discretized profile is of length 80; the elevation is in meters. We shall work in this 
discretized field of 80 locations, and treat the elevation as representing some positive-by-definition physical 
property {Y in its natural unit) that we need to infer. Also marked out in Figure 1 are 5 locations where 
we have direct measurement of y (i.e. type-A data), and 15 locations chosen as "inverted anchors". We did 
not define anchor functions for this example: all anchor parameters are anchor points. 

We constructed two synthetic "forward" processes, both described by the following differential equation: 

where s is a source term, z is the forward process outcome controlled by the field Y and the source s as well 
as boundary conditions. All of y, z, and s are functions of location x. The first forward process has s = 
everywhere, z{l) = 1, and ^(80) = 0. The actual z resulted from this process in the true field is shown in 
Figure 2, along with 9 measurements. The second forward process has s(20) = 4000, s(53) = 2000, s = 
elsewhere, z(l) = 100, and z(80) = 300. The actual z along with 6 measurements are shown in Figure 3. 
Note that we have intentionally made the two processes have very different magnitudes. The forward model 
M in this setting simply solves the differential equation separately with the two sets of conditions, and 
combines the results to output a 15-vector. The 15 measurements are type-B data Zi,, whereas the 5 direct 
measurements are type-A data Za- All data were assumed to be error free. 

We treated the unknown field as a Gaussian process, assuming a globally constant mean /3, an exponential 
covariance with scale parameter cp, variance rj'^, and zero nugget. Therefore the model parameter vector 
includes 3 structural parameters and 15 unknown type-B anchor parameters. (Another 5 type-A anchor 
parameters are known.) The posterior of was approximated by a normal mixtiire of four sizes (the sample 
size n in Section 4): 5000, 10000, 20000, and 40000, referred to as scenarios ABl, AB2, AB3, and AB4, 
respectively. Recall that the computational cost is proportional to the sample size. The posterior distribution 
of the structural parameters was also inferenced using type-A data only (see Appendix B). This posterior 
distribution was used along with the type-A data to conduct simulations, referred to as scenario A, providing 
a reference for the AB scenarios. 

While using the sampling algorithm described in Section 4, we transformed relevant variables as follows. 
Scale parameter (p: logit transformation on (5,80), i.e. log ggZy ■ Variance parameter rj^: log transforma- 
tion. Mean /3: no transformation. Forward model outcome: no transformation. The inference of posterior 
described in Section 4 is really for these transformed parameters. In addition, we took Y to be the logit 
transformation on (1.7, 10249) of the elevation data. (The value range of the elevation data is [17, 1024.9].) 
For convenience we say Y is in "model unit" whereas the positive field being modeled is in "natural unit". 
Note the forward model (14) takes input field in its natural unit. 

The prior specification for the structural parameters G and the method for sampling the model parameters 
conditional on type-A data are described in Appendix B. 
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Figure 4 shows the posterior density of the scale parameter, calculated from the normal mixture approx- 
imation. The posterior of scenario AB4 does not continue the trend of decreasing scale in scenarios ABl-3, 
suggesting some degree of convergence. Because of the interaction between the scale and variance parameters 
[see Diggle and Ribeiro Jr., 2007, sec. 5.2.5], different value combinations of these two parameters may have 
comparable model fitting performance. This parameter interaction is clear in Figure 5, which shows the joint 
posterior of this pair of parameters. The variance parameter tends to be much smaller in scenarios AB3-4 
than ABl-2. 

For each AB scenario, we randomly drew 1000 values for the model parameter vector from the poste- 
rior distribution (i.e. the normal mixture approximation), and simulated one field realization based on each 
parameter value. A prediction with the forward model was obtained with each simulated realization. Corre- 
sponding sampling and simulations were also done with scenario A for comparison. These results are shown 
in Figures 6-10. 

The distribution of the sample of inverted anchors are shown in Figures 6-7, with comparisons to the true 
anchor values and to scenario A. The AB scenarios are a clear improvement over scenario A. The scenario 
AB4 has reduced bias and variance compared to ABl. Comparison of Figures 6 and 7 reveals two problems. 
First, the deviation of the simulated anchor values from their true values is amplified in the natural unit. 
Second, in natural unit, the posterior distribution of anchor values has a long tail upward. If we were 
confident enough to impose a tighter range on the modeled field values (we allowed it to be up to 10 times 
the maximum value of the true field) , both problems would be alleviated. Figures 8-9 show ranges of the 
simulated fields and convey similar messages. 

It is informative to examine how the simulated fields reproduce the type-B data. According to Figure 10, 
the reproduction in scenarios AB2-4 is much better than in scenario A from a statistical perspective. Note 
that the two groups of data components (from two forward processes) have drastically different magnitudes 
and this fact is transparent to the proposed inversion method. 

Since the data are error-free, the non-exactness of the reproduction is solely a consequence of the ran- 
domness in the model parameters and field simulations. The "sharpness" of reproduction has to do with how 
sensitive each datum is to the model formulation. Compared to optimization-based methods that impose a 
specified tolerance on the data reproduction, the observation-simulation mismatch in the proposed method 
is unbounded in theory. This should not be viewed as a weakness of the method; an analogous situation 
is the Gaussian assumption on measurement errors. In Figures 6-9, only two of the four AB scenarios are 
shown due to space restrictions. We intentionally showed scenario ABl, whose reproduction of type-B data 
according to Figure 10 is not consistently better than scenario A. However, the comparison of scenarios ABl 
and A in Figures 6-9 makes the point that there is much to examine than the reproduction of type-B data. 

6 Comparisons with other methods 

In this section we draw comparisons with some literature in stochastic hydrogeology. Emphasis is on con- 
ceptual points; computational alternatives such as MCMC and Kalman filters are not discussed. 

6.1 The pilot point method 

The Pilot Point (PP) method [de Marsily et al, 1984; RamaRao and LaVenue, 1995; Gomez-Herndnez et al., 
1997; Doherty, 2003; Kowalsky et ai, 2004] is a (generalized) least-square estimator of the field y defined as 
the minimizer of the following objective function: 

J = {M{y) - z^fw-\M{y) - z^,) + {y - fJ.fQ^]^{y - m), (15) 

where is a weight matrix, Qy. y is the covariance between Y according to a spatial covariance func- 

tion, and /X is a known reference (or prior mean) value of the field y. Both Q and fi are determined by 
pre-specified structural parameters 6. This is the common "regularized data reproduction" formulation for 
inverse problems. The optimization algorithm begins with an initial field yo drawn from the normal distri- 
bution Af{n, Qy y) ^^"^ modifies the field until J falls below a prescribed threshold. Optimizing realizations 
reached from different initial fields are usually considered "simulations" , which present some idea about the 
uncertainty in the estimator. 
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Because optimizing the field vector y directly would be mostly infeasible, a number of locations in the 
field are chosen as "pilot points", and the Y values at these locations are used to guide the optimization 
algorithm. In so doing, the field is actually 



where y* = {y'^,y*) is a vector of (fixed) direct measurements y^ and (tunable) pilot point values y*, and 
X* is the location vector of direct measurements and pilot points. The algorithm thus reduces the dimension 
of optimization from the size of the model grid to the number of pilot points in the context of a specific 
initial field yo and known structural parameters 6. In exchange for this dimension reduction, the procedure 
fills in the Y values at locations other than x* based on the assumed spatial correlation {Q{- \ 6)). 

Some studies make the choice of pilot points an internal matter of each optimization run [RamaRao and 
LaVenue, 1995], i.e. contingent upon each particular yp! others use the same set of pilot points across all 
optimization runs [Gomez-Herndnez et al, 1997]. In the former case, the pilot points are purely an assistant 
to the optimization algorithm. In the latter, the pilot points permit a higher-level interpretation as model 
parameters. 

In a variant to the objective function in (15), the regularization term may be written in terms of the 
pilot point values instead of the entire field vector. The regularization term may also be omitted, in which 
case the regularization is implicit in the procedure. 

We shall make three observations on the PP method. 

(1) PP is a state-space approach, in which the model parameter vector is the entire field y, which 
is obtained by optimization, while the structural parameters 6 and pilot points (a;*,!/*) are intermediate 
devices to help the optimization procedure. Due to the nature of optimization, the generated fields are not 
random samples from a well defined target distribution. As a result, if one views the ensemble of realizations 
created in PP as "simulations" , it is unclear what distribution is being simulated from. This distribution is 
collectively determined by all aspects of the optimization procedure. Zhang [2009, "An analysis of the pilot 
point method in stochastic hydrogeology" , in preparation] provides an analysis of this distribution. 

(2) The method requires the structural parameters 9 to be pre-specified, and has no provisions for 
systematically updating these parameters in light of the data Zy^. This inability is because these parameters 
are used in creating yo ^-^id in updating the field (following modifications to the pilot point values) in the 
optimization algorithm according to (16). This is despite the claim of Kowalsky et al. [2004] that they can, 
but do not, optimize over the structural parameters the same way as they do the pilot point values. 

(3) Creating each realization goes through the same optimization procedure, therefore computational 
cost of PP is proportional to the number of realizations requested. 

The proposed method of anchored inversion is a clear contrast to the PP in all these three aspects. 
In anchored inversion, the model parameters are 0, which contains conventional geostatistical structural 
parameters and anchor parameters; the field y is formulated as having a normal distribution described by 
0. The distribution of conditional on all data is derived. Given the posterior distribution of 0, creating 
realizations of y requires negligible additional computation. 

In a PP procedure, the pilot points may well be generalized to be linear functions of the field, iJy, in 
exactly the same way as in anchored inversion. In fact, we expect this generalization to enhance the efficiency 
of the optimization if the linear functions are well chosen. The difference between anchored inversion and 
the PP method is not in this part of the parameterization, but in how this parameterization is used. 

6.2 Two methods that estimate the mean and covariance of the field conditional on the data 

The "quasi-linear" method [Kitanidis, 1995; Nowak and Cirpka, 2004] essentially parameterizes the field Y 
as a normal distribution JV{fj,, S) with 



y = yo + QY.Y*QY^',Y'{y* -yoix*)), 



(16) 



-1 



(^b-X(O) 



HY.HY 



S — Qy y ^ 



Y.HyQ HY,HY^HY,Y 



This formulation has two parameter vectors: the distributed parameters ^ and the structural parameters 6. 
The Q's are covariance matrices defined by 0, analogous to (3). The coefficient matrix H is the sensitivity 
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matrix of the forward process M with regard to the mean field fi: 

dM{y) 



H{iJi) , , 
dy 



V=IJ- 



This definition makes it necessary that the mean field /x (or the parameters ^ and 9) be estimated in an 
iterative procedure, updating H as the estimate of /i. changes. The parameters ^ and (hence /x and 
S) are estimated by optimizing an objective function that involves data reproduction, regularization, and 
functions of the structural parameters. The linearization A^(/x + A) « + H{fi)A is instrumental in 

the optimization procedure. 

After the estimates of /x and S have been obtained, Kitanidis [1995] proposes to generate realizations 
through an optimization procedure with an objective function similar to (15) (in which Q is replaced by S). 
We suspect this extra optimization is unnecessary and would choose to sample from the normal distribution 
A/'(/i, S) directly. In fact, Kitanidis [1995] shows that the extra optimization makes little difference compared 
to direct sampling. 

Hernandez et al. [2006] use the pilot point parameterization to describe the field as follows: 

y = Xf3 + Qy y'Qy' .y' (y* - ^-*'^)' 

where X and X^^* are matrices of covariates ("design matrices") for locations x and x*, respectively, and 
the other notations are analogous to those in (16). The matrices Qyr y" ^^'^ Qy'X" determined by 
covariance parameters 9. The parameters of this formulation have three components: trend coefficients 
f3, covariance parameters 9, and pilot point values y* . An objective function analogous to that of the first 
optimization in the quasi-linear method is defined, involving data reproduction, regularization, and functions 
of the covariance parameters. For each of a finite set of (/3, 9) values, the objective function is optimized 
with respect to the pilot point values. The best (/3, 9) value is chosen using empirical criteria and the 
terminating values of the objective function. The corresponding terminating pilot point values are taken as 
estimates of y* . Subsequently, the conditional mean of the field is as in the formula above with /3, 9, and y* 
taking their estimated values, and the conditional covariance is taken to be Q-y y — Qy- ^.Qyl y^Qy* 
Subsequent generation of field realizations entails random draws from the normal distribution defined by 
these conditional mean and covariance matrix. 

These two methods share with anchored inversion two important advantages over the pilot point method. 
First, the structural parameters are estimated taking into account the data Z\^. Second, generation of 
realizations from the estimated normal distribution is essentially free (if the quasi-linear procedure foregoes 
the second optimization step). 

The pilot points in Hernandez et al. [2006] are equivalent to inverted anchor points in anchored inversion. 
As with the pilot point method, the pilot point parameterization may be generalized to any linear functions 
of the field, just as in anchored inversion. The quasi-linear method has the effect of defining anchors H{fj,)Y . 
This implicit definition differs from anchored inversion in that (1) the number of these anchors equals the 
length of the data vector ^b, ruling out the option to use more or fewer anchors; (2) although these anchors 
are very effective, they are unknown until the mean field fi has been estimated. 

In these two methods, the mean and variance of the Gaussian model for the field have one set of optimal 
estimates. This differs fundamentally from anchored inversion, in which the Gaussian model parameters 
follow a multivariate posterior distribution. 



6.3 Difficulties in a common formulation of the likelihood with respect to type-B data 

The pilot point method and other methods discussed above make use of the construct 

{M{y)-z^fw-\M{y)~z^), 

which is legitimate for least squares estimation or optimization as long as the matrix W is positive definite. 
In some studies in the literature of stochastic hydrogeology [McLaughlin and Townley, 1996; Kowalsky et al., 
2004; Li et al., 2007], this construct is expanded to a normal "likelihood" of y with respect to jZb: 

(27r)-"/2|Wri/2exp{_i(X(y) _ z^)'^W-\M{y) - z^)}, (17) 
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where n is the length of the data vector z^. We notice that this interpretation has two requirements. First, 
M{y) — z\j is the observed value of a random variable conditional on the model parameters, which is y 
here. Second, this random variable indeed has a normal distribution with mean zero and covariance W. We 
have established in Section 3.4 that this random variable is ebi — eb2, where ebi is model error and eb2 is 
measurement error. We shall examine the two requirements in view of this fact. 

The first requirement suggests that errors, either in measurement or in model or in both, must exist for 
this formulation to be meaningful. This has un-natural effects in synthetic studies. In probably all synthetic 
studies that we have read, model error is absent, i.e., the same forward computer code is used for generating 
synthetic data and in the inversion operations. This necessitates introducing noise to the synthetic data 
in order to use the likelihood formulation above. The known covariance matrix of the synthetic noise, 
W, is critical information for the construction of (17), and this covariance is not subject to "estimation". 
In Hernandez et at [2006], the "measurements are taken to be error free" (p. 12) yet a variance of the 
measurement error is estimated. On another occasion in that article (p. 7), random noise is added to the 
synthetic data and the estimated error variance is close to the true variance of the added noise. However, 
having this quantity subject to estimation in data fitting is questionable in the first place (if one is to maintain 
a likelihood interpretation), and it is unclear why its estimate turns out to be (in their method), or should 
be, close to the truth. Similar inconsistencies have appeared in other studies. 

Now comes the second requirement: how to specify the matrix W. Despite the fact that the random 
variable involved is the combination of measurement and model errors, some studies have called it "measure- 
ment error" . This nomenclature can mislead the user to specify its magnitude (or distribution) with only 
the measurement mechanism in mind and assume independence between its components. Li et al. [2007] call 
it "epistemic error" and include in it error in intermediate models that are used to obtain the observation 
^b. This is an improved view over purely "measurement" error, but still ignores error in the forward model. 

The biggest concern over the formulation (17) is that the error covariance matrix (W) is hardly known 
to any high level of accuracy (see Section 3.4 for complications in the model error), yet it is a center piece in 
the evaluation of the likelihood. The likelihood becomes more and more sensitive to the specification of W 
as the data and model quality increases. In the limit where both the forward model and the measurement 
are free of error, which is perfectly legitimate in synthetic studies, the formulation simply fails. 

This awkward situation has been noticed in Kitanidis [1995], which observes that the quasi-linear method 
proposed in that article performs poorly when the measurements "have very small observation errors" 
(p. 2418). 

There have been mentions to this issue in the literature. Lee et al. [2002] recognize the model error 
component, and find it a delicate task to specify W . They take an empirical approach to this specification. 
Also see the references cited by Lee et al. [2002]. Scales and Tenorio [2001] clearly state that the random 
variable behind M{y) — zy, is the sum of measurement error and model error. Questions are also raised by 
Ginn and Cushman [1990, sec. 3.1] regarding the likelihood formulation (17). Tarantola and Valette [1982] 
make it one of the minimum requirements that the formulation of an inverse problem "must be general 
enough to incorporate theoretical errors in a natural way." The "theoretical error" in their terminology is 
equivalent to the "model error" here. They give a telling example that, "in seismology, the theoretical error 
made by solving the forward travel time problem is often one order of magnitude larger than the experimental 
error of reading the arrival time on a seismogram." 

As is explained in Section 3.4, the proposed anchored inversion avoids this conceptual (and consequently 
practical) difficulty by making the relation between observations (zb) and model parameters (0) a well 
defined statistical one, regardless of the existence or absence of "errors" . It is our hope that the preceding 
paragraphs have clarified a long, widely overlooked issue, and have suggested feasible directions for effort: 
either quantify the model error, or render it negligible. We believe it is very important in applications to 
recognize that the forward model is almost never perfect, yet method testing in synthetic settings often 
assume the model is perfect. 
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7 Summary 

7.1 Contribution of the paper 

We proposed a general method, named "anchored inversion" , for modehng spatial Gaussian processes (ran- 
dom fields) with an emphasis on assimilating various kinds of data in a systematic procedure. The method is 
based on a new parameterization device named "anchors" and a data classification scheme. Anchors relate 
to linear functions of the field. This device is useful due to a basic property of Gaussian processes, namely, 
a Gaussian process conditional on a linear function of the field is still a Gaussian process, with conditional 
mean and (co)variance known analytically. This makes anchors viable parameters for describing the random 
field and providing a powerful mechanism to introduce (or impose) flexible global and local features. The 
data classification scheme is based on an abstraction of the data-unknown relation. Since the usage of a 
particular dataset is based on its position in this classification, data of radically different natures can be used 
simultaneously as long as reliable (numerical) models exist for the data-generation processes. The method 
is blind to disciplinary details and technical implementations of the data-generation processes. 

According to the data classification, a dataset is of either type- A or type-B. Informally speaking, type- A 
data are linear functions of the field, whereas type-B data are nonlinear functions of the field. The inverse 
procedure transfers information in nonlinear data into the form of linear data, thus transforms an unfamiliar 
problem to a familiar one. The information transfer is in a statistical sense; it can not be complete or exact. 
The benefit is rigor and relative ease in statistical inference as well as analysis of the results. 

The formulation of the method leads to a radical reduction in the dimensionality of the model parameters 
in comparison with the common state-space approach that uses the entire spatial field as a model parameter 
vector. Benefit of this dimension reduction is, again, rigor and relative ease in statistical inference and result 
analysis. 

The extremely high dimensionality of model parameters (in an state-space approach) has long troubled 
modelers. This contributes to the "identifiability" or "ill-posedness" issue, that is, there is no unique, best 
solution to the problem; many solutions are equally good by one's chosen criterion. This phenomenon is 
termed "equifinality" by Beven [2006] , who identifies as an important future research area "how to use model 
dimensionality reduction to reduce the potential for equifinality." The ill-posedness implies that, if tackled 
from a statistical perspective, the model parameter vector has no maximum likelihood estimate [O'SuUivan, 
1986]. With radically reduced parameter dimensionality in the proposed anchored inversion, the problem 
could become well-posed in terms of the much shorter parameter vector. 

Moreover, compared to a state-space approach, one has complete control over the number and choice of 
anchor parameters. The parameter dimensionality is no longer "hijacked" by the numerical model grid, and 
the resolution of the model grid is no long restricted by difficulties with parameter dimensionality. These 
two things, which are better to be separable, are now separate. 

Anchored inversion is not centered at optimization. This has conceptual as well as computational im- 
plications. Main conceptual advantages are (1) statistical interpretations of the result (see comments in 
Section 6), and (2) avoidance of difficult technical issues including defining an objective function (especially 
relative weights of different data components and terms) and choosing a termination criterion. The main 
computational advantage is that the cost of generating multiple field realizations by anchored inversion is 
not determined by the cost of an optimization algorithm that is performed for each realization. 

We proposed a sampling-based algorithm that approximates the parameter posterior distribution by 
a normal mixture. The algorithm requires evaluation of the forward model exactly once for each value 
examined for the model parameter vector. The algorithm is quite general and is well suited to the problem 
of conditioning on nonlinear data. After inference of the model parameters, realizations of the spatial field 
can be created at negligible computational cost. 

The proposed anchored inversion has a modular structure that is worth noting. The method makes little 
assumption about details of the structural parameters and their prior specification. The choice of anchor 
parameters is a separate task, as is development of the forward model. Even the sampling method described 
in Section 4 is a modular component, although we expect the ideas of normal mixture approximation and 
conditioning to remain important efficiency features in future refinement of this computationally demanding 
component. 
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We commented on connections of anchored inversion to existing methods in stochastic hydrogeology. 
Special emphasis was placed on a conceptual issue that involves treatment of measurement and model 
errors. We pointed out deficiencies related to this issue in some state-space approaches, and explained how 
anchored inversion tackles these difficulties. 



7.2 Future work 

The most obvious omission in this paper is how to choose anchor parameters, including the number of these 
parameters and the exact definition of each. In the case of anchor points, the definition means the locations 
of anchors. 

The proposed algorithm for sampling the parameter posterior needs to be refined, possibly both in 
depth and in breadth. In depth, it may be possible to develop an iterative procedure to achieve sequential 
refinement. In breadth, it may be useful to run multiple independent "threads" of the algorithm in parallel, 
comparing/merging results and monitoring for convergence. Technical details of the normal mixture (kernel 
density estimation), in particular determination of bandwidth (the scaling factor h in (12)) need explorations 
[see Jones et at, 1996]. 

The proposed method makes little assumption about the parameterization of the random field by struc- 
tural parameters 0; neither does not restrict the specification of a prior for 6. We illustrated with a simple 
formulation (Appendix B). A particular application may warrant more sophisticated formulations such as 
anisotropy and flexible spatial correlation models, especially the Matern model [Stein, 1999; Banerjee and 
Gelfand, 2003; Marchant and Lark, 2007; Zhang et ai, 2007]. 

Appendix A: properties of the multivariate normal distribution 

The following are standard results; see, e.g., Johnson and Wichern [1998, chap 4]. 
If X ~ A^(At,S), then 

HX - N{Hfj,,H'EH^). 



(A-1) 



Let X = 



X2 



id S = 



S21 S22 



be distributed as A^(/i, S) with jju = 
given X2 = X2, the conditional distribution of Xi is 

Xi I 032 - N{tll + Si2£22'(a=2 - /^2), Sn - i:i2S22'S2i 

If we have X - N{fi, S) and Y = HX, then from (A-1), 



and IS22I > 0. Then 



(A-2) 
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■ s 
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7 





(A-3) 



Given Y = y, then (A-2) and (A-3) suggest the following conditional distribution, which is convenient for 
the current study: 



X\y N{ti + 'EH^{Hi:H^) \y - Hfi), T, - SH^ {HSH^) ^ HS) 



(A-4) 



Appendix B: structural parameters, prior specification, posterior with respect 
to direct measurements, and sampling 

We model the spatial process Y as the composition of two components: a mean function describing the 
expected value of Y{x), and a stochastic process describing fluctuations around the mean. The random 
vector Y{x) is assumed to have a multinormal distribution: 

Y{x) ^ N{Xj3,rfR), (B-1) 

where X is a "design matrix" of known covariates (with a leading column of I's), /3 is a vector of trend 
coefficients, rj^ is the variance, and R is the correlation matrix of Y at locations x. The correlation between 
y's at locations xi and X2 is modeled as p{xi, X2', 0) with parameter vector 0. The vector usually contains 
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components for the range of correlation, smoothness of the spatial process, nugget effects, and anisotropy. 
In this formulation, the structural parameter vector is 9 = {4>, 7^,(3). 
We use the following prior: 

p{e) - p{l3, r,^\ct>)- p{4>) - p{(l>)l{ifT, (B-2) 

where a is a constant, taken to be 1 in this study. The "noninformative" prior for (3 and rj^ conditional on 
is a typical choice; see Box and Tiao [1973, sees 1.3, 2.2-2.4, 8.2] and Berger et al. [2001]. The prior for 
the correlation parameters (p is left flexible and carried along in the analytical derivation; the specific choice 
for p{4>) in this study is uniform in a bounded interval (see below). 

With the model (B-1) and the prior (B-2), given point values (a;*,y*), we can derive the posterior in a 
hierarchical fashion as follows: 

f3\rf,cf>,y, ^ N{Q-'xjR-'y,, rf Q'^) (B-3) 

77^ I 0, y* - Inv-x^ (^n + 2a - - 2, {n + 2a-dp~ 2)"^ S^^ (B-4) 

p(0|y.) OC p(0)|H,|-V2|Q|-l/2(^2)-(n+2.-..-2)/2 

where n is the length of data y*, is the dimension of /3, is the design matrix for the data locations 
£c*, is the correlation matrix of Y at the data locations x^, Q = X'^ R'^^ X^,, and = {R^^ — 
R~^X^Q~^Xj R~^^ y^. The symbols iV and Inv-x^ stand for the normal and scaled inverse-chi-square 
distributions, respectively. Derivations of these results can be found in Kitanidis [1986], Gelman et al. [1995, 
sees 2.6-2.7, 3.2-3.4, 3.6], and Diggle and Riheiro Jr. [2007, chap. 7]. 

For illustrations in this study, we use the isotropic exponential correlation function without nugget, that 

is, 

p{xi,X2; 4>) = exp(-]a;i - X2\/X), 

where ] ■ ] is the Euclidean distance and A > is a range parameter, which is the only component of (p. 
The prior of the range parameter A is taken to be uniform on (0.05L, L), where L is the size of the model 
domain. While this support of A is chosen subjectively, there is no need to consider much larger ranges, 
because correlation at long distances is better captured in the mean function [see Diggle and Riheiro Jr., 
2007, sec. 5.2.5], and because the variance and range parameters have an intimate interaction. According to 
De Oliveira et al. [1997], a bounded support for the correlation parameter guarantees a proper posterior. 

To sample from the posterior distribution p{(3, t;^, ^ ] y*), we first discretize the correlation parameters 4> 
(here it is just a scalar A) and pre-compute p(0 ] y*) for the set of (p values according to (B-5), then sample 
<f) from this discretized set with weights proportional to their posterior densities. Given a sampled 0*, we 
sample ryj from the scaled inverse-chi-square distribution p(?7^ ] 0*,y*) in (B-4), and finally sample /3* from 
the normal distribution p{j3 \ ri1,cj)^,y^) in (B-3). 
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Figure 2: The first synthetic forward process and 9 measurements of it. 
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Figure 4: Posterior distribution of the transformed scale parameter. (Labels on the top are the back- 
transformed parameter values.) Curves 1-4 correspond to sample sizes 5000, 10000, 20000, and 40000, 
respectively. 
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Figure 5: Contour plot of the posterior joint density of the transformed scale and variance parameters. 
(Labels on the top and right are the back-transformed parameter values.) 
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Figure 6: Box plots showing the posterior marginal distributions of the inverted anchors (in model unit), 
based on 1000 random draws from the anchors' posterior distribution with respect to type-A data (scenario 
A) and to both type-A and type-B data (scenarios ABl and AB4). The thick gray curve is the actual field 
(in model unit). 
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Figure 7: Same as Figure 6 but in natural units. 
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Figure 8: The 5th, 25th, 50th, 75th, and 95th percentiles of 1000 realizations of the field (in model unit) 
created based on the parameter posterior with respect to type-A data (scenario A) and to both type-A and 
type-B data (scenarios ABl and AB4). The thick gray curve is the actual field (in model unit). 
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Figure 9: Same as Figure 8 but in natural units. 
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Figure 10: Distributions of the forward model output (i.e. reproduction of the type-B data) in 1000 field 
realizations created based on the parameter posterior with respect to type-A data (bottom curve in each 
panel) and to both type-A and type-B data (lifted curves in each panel; with sample sizes 5000, 10000, 
20000, 40000 from bottom to top). The density values (on Y axes) should not be read literally except for 
the bottom curve, due to the shifted stacking-up of multiple curves. The vertical lines mark the true type-B 
data values. 



